A new threshold selection method for species distribution models with presence‐only data: Extracting the mutation point of the P/E curve by threshold regression

Abstract Selecting thresholds to convert the continuous predictions of species distribution models into binary outputs is critical for many real‐world applications and model assessments. Prevalent threshold selection methods for presence‐only data require unproven pseudo‐absence data or researchers' subjective decisions. This study proposes a new method, Boyce‐Threshold Quantile Regression (BTQR), to determine thresholds objectively without pseudo‐absence data. After reviewing relevant articles, we find that the mutation point is a typical shape feature of the predicted‐to‐expected (P/E) curve. Analysis based on source‐sink theory suggests that this mutation point may represent a transition in habitat types and serve as an appropriate threshold. Threshold regression is introduced to accurately locate the mutation point. To validate the effectiveness of BTQR, we used four virtual species of varying prevalence and a real species with reliable distribution data. Six different species distribution models were employed to generate continuous suitability predictions. BTQR and nine traditional methods transformed these continuous outputs into binary results. Comparative experiments show that BTQR has advantages in terms of accuracy, applicability, and consistency over the existing methods.

The approaches available for threshold selection are strongly constrained by the available data types. There are many methods to select the threshold for presence-absence data (Liu et al., 2005). However, since determining whether a species is truly absent or simply not detected is difficult, absence data are hard to obtain and often unreliable (Della Rocca et al., 2020; Gomes et al., 2018; Guillera-Arroita et al., 2015). Selecting a suitable threshold for presence-only data is problematic, as conventional confusion matrices cannot be formed (Liu et al., 2013; Rebelo & Jones, 2010). A prevalent method to choose thresholds is fixed sensitivity, as it can be conveniently calculated from presence-only data. Some common fixed values include sensitivities equal to 0.5, 0.75, and 0.9 (Se0.5, Se0.75, and Se0.9) (Bellamy et al., 2013; Ensing et al., 2013; Toth et al., 2013).
The selection of these parameters, however, is subjective and lacks theoretical justification (Liu et al., 2013). Other researchers use pseudo-absence data to calculate sensitivity and specificity and then select the threshold. A common strategy is the threshold that equalizes sensitivity and specificity (ESS) (Dobrowski et al., 2011; Swanson et al., 2013), as is the threshold that maximizes their sum (MSS) (Higgins et al., 2012; Vacchiano et al., 2013). Since specificity is determined from pseudo-absence data rather than real absence data, a deviation between the threshold generated by these methods and the true value cannot be avoided. Another method of threshold selection is to use known traits of species data and predicted values, for example, the mean predicted probability (MeanProb) (Nenzén & Araújo, 2011).
Boyce established the predicted-expected (P/E) ratio to evaluate the accuracy of species distribution models (Boyce et al., 2002). This method assesses accuracy by calculating the ratio between the predicted frequencies of the model and the expected frequencies of the random distribution without using absence data. Hirzel pointed out that the information in the P/E curve can help to reclassify continuous output results (Hirzel et al., 2006). Some studies set the boundary of P/E = 1 as an objective threshold (Arshad et al., 2012; Witt et al., 2022), but they can only ensure that the transformed prediction results are better than the random results. There are studies suggesting that the threshold can be determined based on the shape of the curve (Folmer et al., 2016; Wang et al., 2008). Compared to setting P/E = 1, this threshold selection method utilizes more curve information but relies on the judgment of researchers to select specific points, which is arbitrary and unrepeatable.
After reviewing relevant articles, we found that mutation points and slope changes are typical shape features of P/E curves. The source-sink theory was then applied to demonstrate that mutation points can serve as effective thresholds for distinguishing high-quality habitats (in this article, the "mutation point" is also referred to as the threshold value). The source-sink pattern is a well-established concept in landscape ecology and is considered to exist widely in nature (Dunning et al., 1992; Furrer & Pasinelli, 2016; Gilroy & Edwards, 2017; Runge et al., 2006). A "sink" is a specific habitat or patch where a species cannot grow naturally and relies on external migration (mortality > natality). A "source" is a habitat in which a species can grow and spread naturally (mortality < natality) (Pulliam, 1988). Many studies have shown that identifying sources and sinks is critical for species conservation research, and production-capable source points are key areas that need to be protected (Gervasi et al., 2015; Gilroy & Edwards, 2017; Weegman et al., 2016). We believe that as habitat suitability grows, there is a shift in "sink-source" type at a specific range. This shift is reflected in the P/E curve as an abrupt change in slope at the inflection point.
The method proposed in this paper introduces threshold regression to quantitatively analyze the structural change of the P/E curve and accurately select the mutation point as the threshold.
We apply the threshold obtained from BTQR to classify the predictions of different species distribution models, including the generalized additive model (GAM), generalized boosted regression model (GBM), generalized linear model (GLM), random forest (RF), multivariate adaptive regression splines (MARS), and maximum entropy (MAXENT). We designed a comparative experiment using the real species giant panda (Ailuropoda melanoleuca) and four virtual species with prevalences of 0.1, 0.25, 0.5, and 0.75 to evaluate the classification effects of BTQR and other threshold methods. The empirical results demonstrate that BTQR has broader applicability, relatively high accuracy, and greater consistency under changes to the validation dataset.

| The feature of the P/E curve shape
Boyce proposed a method for evaluating species distribution models that requires only presence data (Boyce et al., 2002). The method divides the continuous predictions of the model output into b classes and calculates two metrics for each class: (1) the predicted frequency of validation points $P_i$, and (2) the expected frequency of validation points $E_i$, that is, the frequency expected from a random distribution across the study area.

$$P_i = \frac{p_i}{\sum_{j=1}^{b} p_j}, \qquad E_i = \frac{a_i}{\sum_{j=1}^{b} a_j} \tag{1}$$

Here $p_i$ is the number of validation points belonging to class $i$ and $\sum p_j$ is the total number of validation points. $a_i$ is the number of grid cells or area covered that belongs to class $i$ and $\sum a_j$ is the total number of grid cells or total area of the entire study area. Finally, for each class $i$, the predicted-to-expected (P/E) ratio $F_i$ is given by

$$F_i = \frac{P_i}{E_i} \tag{2}$$

By using a moving window with a very small width of W, the habitat suitability values covering the range of [0,1] can be divided into […] show a monotonic increase.
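The per-class frequencies and their ratio can be illustrated with a short sketch. This is only a minimal illustration, not the authors' implementation: the function name `pe_curve`, the window width of 0.1, and the 100 window positions are assumptions chosen for the example, and only the Python standard library is used.

```python
def pe_curve(presence_scores, area_scores, n_windows=100, width=0.1):
    """Return (window midpoints, P/E ratios) for suitability scores in [0, 1].

    presence_scores: model predictions at the validation presence points.
    area_scores: model predictions at every grid cell of the study area.
    n_windows and width are illustrative defaults (assumptions).
    """
    n_pres = len(presence_scores)
    n_area = len(area_scores)
    step = (1.0 - width) / (n_windows - 1)  # distance between window starts
    mids, ratios = [], []
    for k in range(n_windows):
        lo = k * step
        hi = 1.0 if k == n_windows - 1 else lo + width
        p_i = sum(lo <= s <= hi for s in presence_scores)  # presences in window
        a_i = sum(lo <= s <= hi for s in area_scores)      # grid cells in window
        if a_i == 0:
            continue  # E_i = 0, so the ratio is undefined for this window
        P_i = p_i / n_pres  # predicted frequency
        E_i = a_i / n_area  # expected frequency under a random distribution
        mids.append((lo + hi) / 2)
        ratios.append(P_i / E_i)
    return mids, ratios


# Toy data: grid cells spread evenly over [0, 1], presences mostly at high scores.
area = [i / 999 for i in range(1000)]
presence = [0.95] * 50 + [0.05] * 5
mids, ratios = pe_curve(presence, area, n_windows=10, width=0.1)
```

Windows with no grid cells are skipped rather than assigned a ratio, because the expected frequency is zero there and the ratio is undefined.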
We searched the Web of Science and Google Scholar platforms for articles from 2008 to 2022 using the keywords "P/E curve," "predicted-expected ratio," "Boyce index," and "species distribution model." Thirteen articles that plotted P/E curves in studies related to species distribution modeling were screened in the search results.
Table 1 summarizes the relevant information from these 13 articles.
The review indicates that there may be a universal pattern in the shape of the P/E curve. Except for one study that classified habitat suitability into four categories and could not reflect the explicit shape information of the curve, all other P/E curves are nonlinear.
The curves in 11 of the papers exhibit typical concave features with mutation points.The slopes of these curves vary within different intervals, with lower values when habitat suitability is low and significantly higher values when habitat suitability is higher than a certain value.

| Ecological explanation of the P/E curve
This paper argues that the wide occurrence of mutation points and slope changes in P/E curves in different studies is not a coincidence. When habitat suitability reaches the threshold, the mutation in the shape of the curve is likely to correspond to a change in the habitat type of the species. We suggest that this transformation can be explained by the source-sink theory. The source-sink theory suggests that the differences in habitat types between source and sink are due to different habitat qualities. For a correct species distribution model, the predicted value should quantitatively reflect habitat quality (Almasieh & Cheraghi, 2022; Guisan et al., 2017; Rew et al., 2020). There will inevitably be a threshold in the value range of the model outputs. Patches with suitability above this threshold are sources, while those below this threshold are sinks. Although the species may still be present at very low density in some sink patches, we classify these areas as "non-existence" or "non-habitat." There are two reasons for this. On the one hand, its presence there is unstable and not self-sustaining. This corresponds to a common situation in real environments, where a species is present in a nonhabitat area due to migration or random activities. On the other hand, setting a threshold that classifies all presence points as "presence areas" often results in an excessively high false positive rate, which is not conducive to optimizing the distinction between presence and absence. Source patches with suitability above the threshold are deemed "presence" or "habitat." The following will demonstrate that this threshold causes a significant change in the slope of the P/E curve.
The predicted-expected ratio represents the density of presence points within a certain suitability range, which is positively correlated with the number of individuals in the region. When the suitability is below the threshold, the corresponding habitats are considered sinks. Since all individuals in the sink habitat come from external migration, an increase in suitability within this range will only enhance the attractiveness of the habitat to migrants and cannot increase the natural growth rate. Given that the attractiveness of sink habitats is also restricted by migratory distance and migration resistance, the increase in suitability of sink habitats has a slight effect on the number of individuals, resulting in a low slope on the P/E curve (Figure 1). When the suitability increases beyond the threshold, the corresponding habitat is the source. The increase in

| Calculating mutation points through threshold regression
To accurately identify mutation points in the structure of the P/E curve, this paper introduces a threshold regression method to avoid the arbitrariness of manual interpretation and improve accuracy. Threshold regression is an extension of traditional linear regression widely used in econometrics (Tong, 1983). This method allows coefficients to vary within different regions, which are identified by threshold variables. A grid search method is used to estimate the value of the threshold, which evaluates the sum of the squared residuals for different values of the threshold parameter and chooses the one that minimizes it.
In this paper, we construct a threshold regression for classifying the output results of species distribution models. This formula is written as

$$pe_t = \begin{cases} hs_t\,\beta_1 + \varepsilon_t, & hs_t \le \gamma \\ hs_t\,\beta_2 + \varepsilon_t, & hs_t > \gamma \end{cases} \tag{3}$$

where $pe_t$ represents the predicted-expected ratio as the dependent variable, and $hs_t$ represents habitat suitability predicted by the model as an exogenous variable with region-specific coefficients $\beta_1$ and $\beta_2$. $\varepsilon_t$ is an independent and identically distributed error with a mean of 0 and a variance of $\sigma^2$. The estimated threshold ($\hat{\gamma}$) is one of the values in the threshold variable $hs_t$. To estimate the threshold, we minimize the least squares of the following regression with $T$ observations and two regions,

$$pe_t = hs_t\,\beta_1\, I(hs_t \le \gamma) + hs_t\,\beta_2\, I(hs_t > \gamma) + \varepsilon_t$$

for a sequence of $T_1$ values in $hs_t$, where $T_1 < T$. The trimming percentage is set to 10%, which implies that $T_1$ corresponds to the number of observations between the 10th and the 90th percentile of $hs_t$. The formula for estimating the threshold is

$$\hat{\gamma} = \arg\min_{\gamma \in \Gamma} S_{T_1}(\gamma) \tag{4}$$

where $\Gamma = (0, hs_{\max})$, $S_{T_1}$ is a $T_1 \times 1$ vector of the sum of the squared differences between the observed predicted-expected ratio and the estimated value, and $\gamma$ is a $T_1 \times 1$ vector of tentative thresholds.
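The grid search described here can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the sketch assumes distinct suitability values, and it fits an intercept as well as a slope in each regime, which is an assumption the text does not state. Candidate thresholds are restricted to observations between the 10th and 90th percentiles, following the 10% trimming mentioned above.

```python
def _ols_ssr(xs, ys):
    """Sum of squared residuals of a simple linear fit y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def estimate_threshold(hs, pe, trim=0.10):
    """Grid-search the threshold that minimizes the total SSR of two regimes.

    hs: habitat suitability values (threshold variable), assumed distinct.
    pe: predicted-to-expected ratios observed at those suitability values.
    Assumption: each regime gets its own intercept and slope.
    """
    pairs = sorted(zip(hs, pe))
    n = len(pairs)
    first = max(int(n * trim), 1)            # keep at least 2 points on the left
    last = min(int(n * (1 - trim)), n - 2)   # keep at least 2 points on the right
    best_gamma, best_ssr = None, float("inf")
    for k in range(first, last):
        left = pairs[: k + 1]   # regime 1: hs <= gamma
        right = pairs[k + 1:]   # regime 2: hs > gamma
        ssr = _ols_ssr(*zip(*left)) + _ols_ssr(*zip(*right))
        if ssr < best_ssr:
            best_gamma, best_ssr = pairs[k][0], ssr
    return best_gamma, best_ssr


# Toy P/E curve with a kink at hs = 0.6: flat slope below, steep slope above.
hs_demo = [i / 100 for i in range(101)]
pe_demo = [0.5 * h if h <= 0.6 else 0.3 + 5.0 * (h - 0.6) for h in hs_demo]
gamma_hat, ssr_min = estimate_threshold(hs_demo, pe_demo)
```

On noisy curves the minimum of the residual sum can be flat around the mutation point, so repeating the search over several validation samples and averaging the estimates, as the experiments in this paper do with repeated sampling, is a sensible precaution.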

| MATERIALS AND METHODS
We use both virtual species simulations and empirical studies to systematically analyze the validity and applicability conditions of different threshold methods. The virtual species approach can generate simulated distribution data with different prevalences by adjusting the response function to environmental factors, conveniently assessing the sensitivity of threshold methods to different combinations of model and species traits. The spatially explicit simulated data also make it possible to compare the classified binary output to a known (virtual) "truth" (Miller, 2014). Most of the existing studies used virtual species to evaluate the effectiveness of threshold selection methods (Li & Guo, 2013; Liu et al., 2013, 2016). However, given that artificially set factors and response functions can hardly depict the complex and ambiguous relationships between species and natural environments, assessment with only virtual species does not fully reflect the effectiveness of the method in the real world (Zurell et al., 2010). We therefore used the giant panda, a real species, as a supplement to the virtual species simulation.
For real species research, we used data from the Chinese government's 4th Giant Panda Survey on giant panda occurrences and suitable habitat in Sichuan Province, China. The 4th Survey on Giant Panda between 2011 and 2014 covered 42 counties in Sichuan Province, with 13,737 survey lines, including all habitats and potential habitats identified in previous surveys, areas where habitat conditions may be suitable, and areas where pandas may be found based on reports from residents (Jiang et al., 2015). During the sampling process, the survey areas were divided into grids with reference to the activity habits of giant pandas and the size of the nest area. Differences in the travel distance and length of giant panda scat sections were used to distinguish whether the observations were of the same panda. The above design minimizes sampling bias in the survey process (Tang et al., 2015). Researchers in this survey also assessed the status of habitat topography, vegetation, food, and co-distributed wildlife resources. The assessment results were used to analyze the suitability for giant panda survival and map the distribution of suitable habitat for giant pandas in Sichuan Province (Tang et al., 2015) (Figure 2). This provided suitable evaluation data for this study to construct confusion matrices against threshold classification results for accuracy assessment.

| Data description
The real species modeling used 910 occurrences marked by the 4th Survey on Giant Panda in Sichuan Province. We chose the same study area for modeling based on virtual and real species. The study area was selected as the main distribution area for wild giant pandas in Sichuan Province, with a total area of approximately 318,175 km² at latitudes 27.80° to 34.09° N and longitudes 100.80° to 106.60° E (Figure 2).
For both virtual species and real species, three sets of environ- The boundaries of the environmental variable data layers were unified.
The raster size of all environmental factors was set to 1 km × 1 km by resampling. Principal component analysis was performed on a total of 22 environmental factors to reduce data redundancy and improve data processing efficiency, resulting in three principal component layers.

| Generation of virtual species
The virtual species were created in three steps (Leroy et al., 2016).
First, the environmental suitability was calculated using an additive approach to the response functions of each principal component, where the possible types of response functions implemented are "gaussian," "linear," and "logistic." The obtained environmental suitability was then rescaled between 0 and 1 (i.e., the range of possible probability values of the virtual species distribution). Second, we converted the environmental suitability raster into a probability of occurrence with a logistic transformation, setting the α and β parameters that determine the shape of the logistic curve. Finally, each grid cell in the landscape was subjected to a Bernoulli trial by generating a random number x from a uniform distribution between 0 and 1 and comparing it with the cell's probability Pr. We assigned a pixel presence (y = 1) if x < Pr and absence (y = 0) otherwise, resulting in the realized presence-absence map.
We generated four types of species with different preferences for environmental factors by setting the type, arrangement, and parameter values of the response function (see Appendix S1). To investigate the sensitivity of the thresholding approach to species prevalence, we set the α value of the logistic conversion to −0.05 and adjusted the β value to set the prevalence of the above species to 0.1, 0.25, 0.5, and 0.75, respectively.
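The conversion and sampling steps can be sketched in a few lines. The sketch below is illustrative only: it assumes the logistic form Pr = 1 / (1 + exp((s − β) / α)), which is the form commonly used for this kind of virtual species conversion but is not spelled out in the text here, and the bisection search used to find β for a target prevalence is a choice made for this example. All function names are hypothetical.

```python
import math
import random


def logistic_prob(s, alpha, beta):
    """Probability of occurrence for suitability s (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp((s - beta) / alpha))


def calibrate_beta(suitability, target_prev, alpha=-0.05, iters=60):
    """Find beta by bisection so that the mean probability equals target_prev.

    With a negative alpha, raising beta lowers the probabilities, so the mean
    probability decreases monotonically in beta over the search interval.
    """
    lo, hi = -1.0, 2.0  # brackets mean probabilities of about 1 and about 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        mean_p = sum(logistic_prob(s, alpha, mid) for s in suitability) / len(suitability)
        if mean_p > target_prev:
            lo = mid  # too many expected presences: raise beta
        else:
            hi = mid
    return (lo + hi) / 2


def realize_presence(suitability, alpha, beta, rng):
    """Bernoulli trial per cell: presence (1) if a uniform draw x < Pr, else 0."""
    return [1 if rng.random() < logistic_prob(s, alpha, beta) else 0 for s in suitability]


# Toy landscape: 5000 cells with random suitability, target prevalence 0.25.
rng = random.Random(0)
cells = [rng.random() for _ in range(5000)]
beta_hat = calibrate_beta(cells, 0.25)
presence_absence = realize_presence(cells, -0.05, beta_hat, rng)
```

Calibrating on the mean probability fixes the expected prevalence; the realized prevalence of any single presence-absence map will differ slightly because of the Bernoulli sampling.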

| Model building
To compare the effectiveness of the thresholding method for different species distribution models and to ensure the generalizability of the BTQR method, six species distribution models, including the generalized linear model (GLM), generalized additive model (GAM), generalized boosted regression model (GBM), random forest (RF), multivariate adaptive regression splines (MARS), and maximum entropy (MAXENT), were used in this study (see the tuning parameters in Appendix S2). GLM, GAM, GBM, and MARS are regression-based models; RF is an ensemble learning model based on decision trees; and MAXENT is a machine learning model that estimates species distribution by finding the distribution with maximum entropy (i.e., closest to geographic homogeneity). In this study, GAM, GBM, GLM, RF, and MARS models were constructed through the Biomod2 package of the R language (Thuiller et al., 2009), and MAXENT models were constructed through Maxent software (Phillips et al., 2006).
For virtual species modeling, we sampled presence points in the created presence-absence map for the training dataset. After excluding the presence points used for modeling, background points were randomly extracted as pseudo-absence data throughout the study area. Based on the recommendations of previous studies, we fixed the ratio of presence to absence points in the training set to 1:2 for models built with absence data (GAM, GBM, GLM, MARS, and RF) (Barbet-Massin et al., 2012; Liu et al., 2016). We constructed training sets for the GAM, GBM, GLM, MARS, and MAXENT models.
For species prevalences of 0.1 or 0.25, we used 5000 presence points, whereas for prevalences of 0.5 or 0.75, we used 10,000 presence points. For the RF model, since using too many random points can lead to a significant decrease in model accuracy, we sampled 50 presence points at species prevalences of 0.1 and 0.25, and 100 presence points at species prevalences of 0.5 and 0.75. All of the above sampling and training processes were iterated 10 times by independent procedures to reduce sampling bias.
For real species modeling, to reduce spatial autocorrelation between data, we excluded points that were too close in geographic space and screened 300 out of 910 presence points of giant pandas as training data. The resolution to rarefy data was set to 4 km, considering the average nesting area size of giant pandas (Zhu et al., 2010). MAXENT is trained using presence-only data. For GAM, GLM, RF, and MARS, we randomly selected 5000 background data points as pseudo-absence data in the same way as virtual species and combined them with the presence data mentioned above to create a presence-absence dataset for model training. The model creation process for real species is iterated 10 times through 10-fold cross validation.

| Threshold selection
Ten threshold selection methods were applied in this comparative study, including: (1) maximizing the sum of sensitivity and specificity (MSS), (2) equalizing sensitivity with specificity (ESS), (3) maximizing kappa (MaxKappa), (4) Boyce-Threshold Quantile Regression (BTQR), (5) equalizing predicted prevalence with observed prevalence (equalPrev), (6) mean predicted probability (MeanProb), (7) minimizing distance between the receiver operating characteristic (ROC) curve plot and the point (0,1) (MinROC), (8) equalizing sensitivity with 0.5 (Se0.5), (9) equalizing sensitivity with 0.75 (Se0.75), and (10) equalizing sensitivity with 0.9 (Se0.9). Validation datasets with different numbers of presence points were generated to evaluate the effect of the presence amount on the threshold classification accuracy. For the virtual species, we generated validation datasets with 1000, 2000, 4000, and 8000 presence points by an independent sampling procedure and subsequently sampled 10,000 background points as pseudo-absence data in the dataset. For real species, validation datasets containing 100, 200, 300, and 400 presence points were constructed, with 1000 background points acting as pseudo-absences. Both presence and pseudo-absence data in each group of validation datasets were randomly sampled 10 times to limit the impact of sample bias on the experiment and to test the stability of different thresholding strategies for random validation datasets. Threshold selection methods (1) to (3) use both presence and pseudo-absence data, while threshold selection methods (4) to (10) are based on presence data only.
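Two of the traditional methods listed above are simple enough to illustrate directly. The following sketch shows one plausible reading of the fixed-sensitivity thresholds (Se0.5, Se0.75, Se0.9) and of MSS; the function names and the convention that a cell is classified as presence when its score is greater than or equal to the threshold are assumptions of this example, not details taken from the study.

```python
import math


def fixed_sensitivity_threshold(presence_scores, sensitivity):
    """Largest threshold t such that at least `sensitivity` of presences score >= t."""
    ranked = sorted(presence_scores, reverse=True)
    k = max(math.ceil(sensitivity * len(ranked)), 1)  # presences that must be kept
    return ranked[k - 1]


def mss_threshold(presence_scores, pseudo_absence_scores):
    """Threshold maximizing sensitivity + specificity over all observed scores."""
    best_t, best_sum = None, -1.0
    for t in sorted(set(presence_scores) | set(pseudo_absence_scores)):
        sens = sum(s >= t for s in presence_scores) / len(presence_scores)
        spec = sum(s < t for s in pseudo_absence_scores) / len(pseudo_absence_scores)
        if sens + spec > best_sum:  # ties keep the lowest threshold
            best_t, best_sum = t, sens + spec
    return best_t


# Toy scores: ten presences and four pseudo-absences.
pres = [i / 10 for i in range(1, 11)]
se09 = fixed_sensitivity_threshold(pres, 0.9)
mss = mss_threshold([0.6, 0.7, 0.8, 0.9], [0.1, 0.2, 0.3, 0.4])
```

The fixed-sensitivity function needs presence scores only, whereas the MSS function depends on the pseudo-absence sample through the specificity term.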

| Threshold method assessment
We assessed the accuracy of the threshold methods using kappa values. For virtual species, we created test datasets based on species prevalence. The test dataset for each virtual species was obtained by randomly sampling from previously generated presence-absence maps, including 50,000p presence points and 50,000(1 − p) absence points (p is the prevalence of the virtual species). We compared the classification results of these points with the true labels, calculating the kappa values. For real species, we calculated kappa values by comparing the binary maps after threshold classification with the habitat distribution maps assessed by the 4th Survey on Giant Panda.
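For reference, the kappa statistic used in this assessment can be computed from a two-by-two confusion matrix as sketched below. The helper names are hypothetical, and the rule that scores greater than or equal to the threshold count as presence is an assumption of the sketch.

```python
def binarize(scores, threshold):
    """Convert continuous predictions to 1 (presence) or 0 (absence)."""
    return [1 if s >= threshold else 0 for s in scores]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: (po - pe) / (1 - pe)."""
    n = len(y_true)
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    po = (tp + tn) / n  # observed agreement
    # Agreement expected by chance from the row and column totals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if pe == 1.0:
        return 0.0  # degenerate case: only one class in both vectors
    return (po - pe) / (1 - pe)


# Toy example: four test points, one presence missed by the binary map.
truth = [1, 1, 0, 0]
predicted = binarize([0.9, 0.3, 0.2, 0.1], threshold=0.5)
kappa = cohen_kappa(truth, predicted)
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it is sensitive to prevalence in the test data.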
We assessed the consistency of each threshold method under changes to the validation dataset through the F-test. For each threshold method, we classified the calculated thresholds into four groups based on the validation dataset used. We treated the validation dataset as the independent variable and computed the F-value, which can be used to test the consistency of the means of multiple sets of data. The higher the F-value, the greater the likelihood that the means of the threshold groups are different, which also means that the corresponding threshold method is more likely to be affected by the number of presence points in the validation dataset. Considering that the number of groups and samples in the validation datasets set up in the experiments for each species are identical, the F-values of the various types of threshold methods can be directly compared. The p-value obtained from the hypothesis test was used to assess whether there was a difference in the mean values of the threshold groups.
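The F-value described here is the one-way analysis of variance statistic, the ratio of between-group to within-group mean squares. A minimal sketch, assuming each group holds the thresholds obtained with one validation dataset size (the function name is hypothetical):

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    k = len(groups)                   # number of groups
    n = sum(len(g) for g in groups)   # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf")  # no within-group variation
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy thresholds from three validation dataset sizes, three repeats each.
f_value = one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The p-value follows from the F distribution with (k − 1, n − k) degrees of freedom; this sketch leaves that step out, and a statistics library such as SciPy (`scipy.stats.f_oneway`) returns both the statistic and the p-value.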

| RESULTS
Figure 3 illustrates the P/E curves and the corresponding thresholds calculated based on the BTQR method for a moving window resolution of 100. All curves show a monotonically increasing trend.
For species 1 and the giant panda, the curves have an upward-concave exponential shape. For species 2, 3, and 4, the curves are mostly closer to a sigmoid, with a slope that is small at the beginning, then increases abruptly, and finally flattens out again. From species 1 to 4, the relative position of the thresholds calculated by the BTQR method on the curve changes with species prevalence but is always located near the point where the slope mutates. The shape of the P/E curve and the position of the thresholds hardly change when, all else being equal, the validation dataset changes. From one randomized experiment to the next, the curve shifts up and down within a certain range, but the shape of the curve corresponding to the same set of models remains essentially unchanged. The number of presence points in the validation dataset has an effect on the magnitude of the fluctuations; the greater the number of presence points, the smaller the range of fluctuations in the curve.

FIGURE 3
The ratio of predicted-expected (P/E) frequency versus prediction values of the species distribution model. Each column of subplots from left to right corresponds to virtual species 1-4 and the real species giant panda. Each subplot row from top to bottom corresponds to the models GAM, GBM, GLM, MARS, MAXENT, and RF. Different colors of "presence" correspond to different numbers of presence points in the validation dataset. For virtual species, Presence 1-4 corresponds to the number of presence points: 1000, 2000, 4000, and 8000. For real species, Presence 1-4 corresponds to the number of presence points: 100, 200, 300, and 400. The light ribbon reflects the range of fluctuations in P/E across all stochastic realizations at the 90 percent confidence level. The dark solid line in the middle of the light band is the mean value of P/E. The dashed line perpendicular to the x-axis represents the mean threshold calculated by the BTQR method.

The accuracy of traditional approaches changes significantly when the species prevalence and the number of presence points in the validation dataset change. For EqualPrev, if the quantity of presence points in the validation set is limited, the kappa value is low, but it grows dramatically as the number of presence points increases. The ESS approach performs reasonably well at species prevalences of 0.1 and 0.25, but poorly at higher species prevalences. For MaxKappa, the kappa value decreases with the number of presence points when the species prevalence is 0.1, whereas it increases with the number of presence points at other species prevalences. The kappa value of MeanProb increases with species prevalence. For MinROC, the kappa value is higher at species prevalences of 0.25 and 0.5 and lower at species prevalences of 0.1 and 0.75. The kappa value of MSS is low at a species prevalence of 0.1 and otherwise performs well. For the three fixed sensitivity threshold methods, Se0.5 and Se0.75 are more accurate at low species prevalence, while Se0.9 performs effectively at high species prevalence. For the real species giant panda, the accuracy ranking of traditional threshold methods also varies with the modeling method; for example, EqualPrev has a relatively high kappa value on the GAM, GBM, GLM, and MARS models but has low accuracy on the MAXENT and RF models. The performance of all types of threshold methods on giant pandas is generally closer to Species 1, which may be due to the low prevalence of giant pandas as a vulnerable species in reality. However, there are exceptions; for example, the kappa value of ESS is high for Species 1 but low for the giant panda. The kappa value of BTQR, while not exceeding those of other methods in every assessment, stabilizes at a high level in the vast majority of cases (except for the RF model at a species prevalence of 0.75). Unlike traditional methods, whose accuracy declines sharply under some conditions, the accuracy of the BTQR method is virtually unaffected by species prevalence, the modeling approach, or the validation dataset.
Figure 6 compares the species prevalence estimated based on the various models and threshold methods with the true species prevalence. At a species prevalence of 0.1, the BTQR estimate yields a prevalence slightly higher than the true value, while Se0.5 is close to the true value. Nevertheless, the BTQR estimate is not far from the true prevalence in this case, and its accuracy is second only to Se0.5. At species prevalences of 0.25 and 0.5, the predicted prevalence based on BTQR is nearly identical to the true prevalence. At a species prevalence of 0.75, the BTQR method performs effectively on all models except RF. Overall, the BTQR method is able to calculate species prevalence with top-ranked accuracy in most scenarios. In contrast, although some traditional methods perform well under specific conditions, they are unable to adapt to changes in species and validation datasets. […] .906, .924, .501, and .467, respectively. The thresholds computed by BTQR have a relatively high degree of consistency among the various sets of thresholds, indicating that the BTQR approach is resistant to the variation of the validation dataset.

| Comparison of threshold selection methods
There are two main purposes for transforming continuous predictions of species distribution models into binary outputs through thresholds. The first is to perform the most accurate habitat classification possible. This accuracy can be evaluated by two independent criteria. On the one hand, the classified habitat should contain as many presence sites as possible and exclude as many absence sites as possible. On the other hand, the classified habitat should be close to the habitat range obtained from reliable actual surveys. Previous research employing just virtual species could only evaluate the threshold method based on the first criterion. This study, however, used real giant panda distribution data in addition to virtual species, and thus the effectiveness of the threshold method could be evaluated on two levels simultaneously. The comparison of kappa values shows that the commonly used traditional threshold methods only perform well under certain conditions. When relevant factors such as species prevalence, validation datasets, and modeling methods are changed, the accuracy of these methods fluctuates considerably.
In contrast, the BTQR method proposed in this paper maintains high accuracy under almost all combinations of conditions. The kappa value of the BTQR method ranks among the top of all methods for both virtual and real species.
The second purpose was to assess species-related parameters with the help of the binarized result. We generated virtual species with defined species prevalence. By comparing the gap between the threshold-estimated prevalence and the true prevalence, we evaluated the effectiveness of each threshold method in predicting this parameter. In most instances, the difference between the BTQR projected prevalence and the real prevalence was rather minimal, except when applying the RF model at 0.75 species prevalence. Other traditional approaches, on the other hand, might be considerably impacted by species prevalence, validation datasets, and modeling methods. The validity of these methods is inconsistent, and their applicability is significantly narrower than that of BTQR.

The average threshold values estimated by different threshold methods. The number at the top of each column represents the quantity of presence points in validation datasets. For virtual species, Presence 1-4 corresponds to the number of presence points: 1000, 2000, 4000, and 8000. For real species, Presence 1-4 corresponds to the number of presence points: 100, 200, 300, and 400. Species 1-4 correspond to species prevalences of 0.1, 0.25, 0.5, and 0.75 in order.

FIGURE 5
The box plot of kappa values for models classified by different threshold methods with the given validation dataset, "Presence 1." For virtual species 1-4, the number of presence points is 1000. For the real species giant panda, the number of presence points is 100. The models corresponding to each row are, in turn, GAM, GBM, GLM, MARS, MAXENT, and RF. Box plots of the corresponding kappa values for validation datasets 2-4 are found in Appendix S3.

FIGURE 6
The line plots of species prevalence estimated based on different models and threshold methods. The number at the top of each column indicates the number of presence points in the validation dataset, from left to right: 1000, 2000, 4000, and 8000. Each row corresponds to a virtual species with a specific prevalence. Species 1-4 correspond to species prevalences of 0.1, 0.25, 0.5, and 0.75 in order. The black dashed lines correspond to the true prevalence of each species. The colors of the lines correspond to different threshold methods. The horizontal coordinates of the solid points correspond to different models.
Liu et al. (2013) summarized three criteria (objectivity, equality, and discriminability) for evaluating threshold methods when only presence data are available. First, thresholds should be objective rather than arbitrary (e.g., thresholds like MSS and ESS that optimize a statistical indicator). BTQR meets this criterion because it selects the threshold objectively using threshold regression, which determines the mutation point on the curve by minimizing the residual sum of squares. Second, the threshold should be identical whether presence-absence data or presence-only data are used. As a threshold method that requires only presence data, BTQR satisfies this requirement. Third, the distinction between presence and absence, rather than between presence and random points, should be optimized. Our experiments have shown that the BTQR threshold obtains higher kappa values, meeting this criterion.
In addition to the existing principles, we suggest that a reliable threshold should be stable when using different random validation data. We divided the thresholds obtained by each method into four groups based on the validation dataset used, then performed

Relationship between BTQR and species prevalence, modeling methods
Although the BTQR method generally showed good stability and accuracy across species prevalences, the shape of the P/E curves and the relative position of the estimated thresholds differed. At a species prevalence of 0.1, most of the P/E curves showed an exponential shape, with the BTQR-estimated thresholds at the inflection points of the curves, and the slopes to the right of the thresholds were substantially higher than those to the left. At species prevalences of 0.25, 0.5, and 0.75, the P/E curves more closely resembled a sigmoid curve. At high species prevalence, most of the BTQR thresholds lie closer to the end of the curve, and for some models the slope to the right of the threshold is even lower than that to the left (e.g., GLM at 0.75 prevalence). These differences suggest that the BTQR method, although general, is not a perfect fit for all species at all prevalence levels. For species with different prevalences, adjusting the parameters and objective function according to their curve characteristics may lead to more accurate thresholds.
The shape of the P/E curve is also affected by the modeling method. Unlike those of other models, the P/E curves corresponding to the RF model are generally gentler, and the inflection points are not obvious. At a species prevalence of 0.75, the curve was even concave downward. In this case, the accuracy of the thresholds estimated by BTQR was low, indicating that BTQR could not be applied to this scenario. We suggest that this difference in shape may be due to differences between the decision-making process of the RF algorithm and that of regression models.

CONCLUSIONS
In conclusion, the Boyce-Threshold Quantile Regression proposed in this study is a promising method for threshold selection when only presence data are available. The process of selecting thresholds with BTQR is based entirely on objective statistical analysis of presence data, with no personal judgment or artificial pseudo-absence data involved. Experiments based on virtual and real species show that the BTQR method can be applied to most combinations of species prevalence and modeling methods, and that it is almost unaffected by the number of presence points in the validation dataset. Compared to the current approaches, BTQR is superior in terms of accuracy, applicability, and consistency.
In the present study, we assessed the effectiveness of the BTQR method using five species and six modeling approaches. However, this assessment remains incomplete, and some novel modeling methods, such as INLA-SPDE (Della Rocca & Milanesi, 2022a, 2022b), were not included in the experiments. In addition, the P/E curves at different species prevalences indicate that there is room for further improvement and refinement of the BTQR method. For species with different prevalences, adjusting the BTQR parameters and objective function according to their curve characteristics may yield more accurate thresholds.
suitability of the source habitat can simultaneously lead to an increase in natural growth rate and attractiveness to migrants. This change in the growth mechanism greatly strengthens the correlation between suitability and the number of presences, causing a marked increase in the slope of the P/E curve to the right of the threshold. When suitability reaches a high value, the main constraints limiting the population of the species in the region shift from external environmental factors to intraspecific competition and the animal's reproductive capacity, at which point the P/E ratio no longer increases with suitability and the curve levels off.
FIGURE 1 Common pattern of P/E curves summarized from existing literature with a classification framework based on source-sink theory. The solid black line represents the P/E ratio, the solid blue line represents the growth rate, and the orange dashed line represents the threshold corresponding to the curve mutation point.
predictor variables were used in this study: climatic data, topographic data, and population density data. The climatic data were obtained from the global climate dataset provided by the WorldClim website (https://www.worldclim.org/data/) at a resolution of 30″. This dataset consists of 19 climate factors related to biological activity, such as monthly temperature (minimum, maximum, and average), precipitation, solar radiation, barometric pressure, and wind speed (Fick & Hijmans, 2017). Topographic data are composed of a digital elevation model (DEM) and slope, with DEM data from the ASTER GDEM V3 dataset published by NASA (Abrams et al., 2020) and slope calculated from the elevation data. Population density data were obtained from the 1-km resolution population density data published on the WorldPop website (https://www.worldpop.org/datacatalog/) (Tatem, 2017).

FIGURE 2 Suitable habitat (green patches) and locations of giant pandas (blue dots) recorded by the 4th Survey on Giant Panda in Sichuan Province, China.

Figure 4 illustrates the average threshold values estimated by the different threshold methods. When the species prevalence and modeling methods change, the value and relative rank of the thresholds calculated by BTQR also change. When other conditions are fixed, however, the threshold values estimated by BTQR are almost unaffected by changes in the number of presence points in the validation dataset.

Figure 5 and Appendix S3 show the distribution of kappa for the different threshold methods after classifying the predictions under multiple combinations of species, models, and validation datasets. The kappa value reflects the accuracy of the thresholding methods.
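For reference, kappa for a binarized prediction can be computed from the four cells of a confusion matrix, as in the minimal sketch below. It assumes that true presence/absence labels are known (as they are for virtual species) and that cells at or above the threshold are classified as present; the function names and toy values are hypothetical.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa from confusion-matrix counts."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (observed - expected) / (1.0 - expected)


def kappa_at_threshold(suitability, truth, threshold):
    """Binarize suitability at the threshold and score it against truth (0/1).

    Assumption: suitability >= threshold is classified as presence.
    """
    tp = fp = fn = tn = 0
    for s, t in zip(suitability, truth):
        predicted = 1 if s >= threshold else 0
        if predicted and t:
            tp += 1
        elif predicted and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return cohen_kappa(tp, fp, fn, tn)


# Hypothetical toy example (not data from the study).
suit = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
true_presence = [1, 1, 0, 1, 0, 0]
k = kappa_at_threshold(suit, true_presence, threshold=0.5)
```

Evaluating `kappa_at_threshold` for the threshold returned by each method gives one kappa value per method, which is the quantity summarized in the box plots.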

Figure 7 illustrates the F and p values corresponding to each type of threshold method when species prevalence changes. MeanProb, MaxKappa, and EqualPrev have F-values in the thousands, considerably higher than the other thresholding methods. Their p-values are also much less than .01. This indicates that the thresholds predicted by these three methods differ significantly when the validation datasets differ. The F-values for the BTQR method are much less than one in the real species case, and for species prevalence of 0.1, 0.25, and 0.5, with p-values of

FIGURE 7 Histograms of the F-value and p-value for each type of threshold method at different species prevalences. The numbers 1, 2, 3, and 4 on the rightmost side of the image represent the species prevalences of 0.1, 0.25, 0.5, and 0.75, respectively.

F-tests. The results show that the thresholds obtained by MeanProb, MaxKappa, and EqualPrev are significantly different when the validation dataset is changed. For thresholds calculated by MSS and Se0.9, the resistance to variation in the validation dataset depends on the species. For example, thresholds of Se0.9 remain steady at 0.1 species prevalence but vary between groups at 0.25 species prevalence. The threshold consistency obtained for Se0.75, Se0.5, ESS, MinROC, and BTQR was relatively high in all species instances.
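The stability check can be illustrated with a one-way F statistic computed across the four validation-dataset groups. The sketch below assumes a standard one-way ANOVA F-test, which may differ in detail from the test used in the study; the function name and the threshold values are hypothetical. A library routine such as `scipy.stats.f_oneway` would additionally return the p-value.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else float("nan")
    return ms_between / ms_within


# Hypothetical thresholds grouped by validation dataset (Presence 1-4).
stable = [[0.40, 0.42, 0.41], [0.41, 0.40, 0.42],
          [0.42, 0.41, 0.40], [0.40, 0.41, 0.42]]
shifting = [[0.10, 0.11, 0.12], [0.20, 0.21, 0.22],
            [0.30, 0.31, 0.32], [0.40, 0.41, 0.42]]
f_stable = one_way_f(stable)      # group means coincide, so F is near zero
f_shifting = one_way_f(shifting)  # group means drift, so F is large
```

An F-value well below one, as in the stable case, indicates that the variation between validation datasets is no larger than the variation within them.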

F
1 ∕ W classes. A smooth P/E curve can be plotted against the average suitability value of each class. If the species distribution model correctly predicts species habitat, $F_i$ should be positively correlated with the model prediction value, and the P/E curve is expected to